在本研究中,我们提出了一种基于病例的新型图像检索(SIR)方法,用于苏木精和曙红(H&E)染色的恶性淋巴瘤的组织病理学图像。当将整个幻灯片图像(WSI)用作输入查询时,希望能够通过重点关注病理上重要区域(例如肿瘤细胞)中的图像斑块来检索相似情况。为了解决这个问题,我们采用了基于注意力的多个实例学习,这使我们能够在计算案例之间的相似性时专注于肿瘤特异性区域。此外,我们采用对比度距离度量学习将免疫组织化学(IHC)染色模式纳入有用的监督信息,以定义异质性恶性淋巴瘤病例之间的适当相似性。在对249例恶性淋巴瘤患者的实验中,我们证实该方法比基线基于病例的SIR方法表现出更高的评估措施。此外,病理学家的主观评估表明,我们使用IHC染色模式的相似性度量适用于代表恶性淋巴瘤H&E染色组织图像的相似性。
translated by 谷歌翻译
Agents that can follow language instructions are expected to be useful in a variety of situations such as navigation. However, training neural network-based agents requires numerous paired trajectories and languages. This paper proposes using multimodal generative models for semi-supervised learning in the instruction following tasks. The models learn a shared representation of the paired data, and enable semi-supervised learning by reconstructing unpaired data through the representation. Key challenges in applying the models to sequence-to-sequence tasks including instruction following are learning a shared representation of variable-length mulitimodal data and incorporating attention mechanisms. To address the problems, this paper proposes a novel network architecture to absorb the difference in the sequence lengths of the multimodal data. In addition, to further improve the performance, this paper shows how to incorporate the generative model-based approach with an existing semi-supervised method called a speaker-follower model, and proposes a regularization term that improves inference using unpaired trajectories. Experiments on BabyAI and Room-to-Room (R2R) environments show that the proposed method improves the performance of instruction following by leveraging unpaired data, and improves the performance of the speaker-follower model by 2\% to 4\% in R2R.
translated by 谷歌翻译
This paper proposes a novel sequence-to-sequence (seq2seq) model with a musical note position-aware attention mechanism for singing voice synthesis (SVS). A seq2seq modeling approach that can simultaneously perform acoustic and temporal modeling is attractive. However, due to the difficulty of the temporal modeling of singing voices, many recent SVS systems with an encoder-decoder-based model still rely on explicitly on duration information generated by additional modules. Although some studies perform simultaneous modeling using seq2seq models with an attention mechanism, they have insufficient robustness against temporal modeling. The proposed attention mechanism is designed to estimate the attention weights by considering the rhythm given by the musical score. Furthermore, several techniques are also introduced to improve the modeling performance of the singing voice. Experimental results indicated that the proposed model is effective in terms of both naturalness and robustness of timing.
translated by 谷歌翻译
Content scanning systems employ perceptual hashing algorithms to scan user content for illegal material, such as child pornography or terrorist recruitment flyers. Perceptual hashing algorithms help determine whether two images are visually similar while preserving the privacy of the input images. Several efforts from industry and academia propose to conduct content scanning on client devices such as smartphones due to the impending roll out of end-to-end encryption that will make server-side content scanning difficult. However, these proposals have met with strong criticism because of the potential for the technology to be misused and re-purposed. Our work informs this conversation by experimentally characterizing the potential for one type of misuse -- attackers manipulating the content scanning system to perform physical surveillance on target locations. Our contributions are threefold: (1) we offer a definition of physical surveillance in the context of client-side image scanning systems; (2) we experimentally characterize this risk and create a surveillance algorithm that achieves physical surveillance rates of >40% by poisoning 5% of the perceptual hash database; (3) we experimentally study the trade-off between the robustness of client-side image scanning systems and surveillance, showing that more robust detection of illegal material leads to increased potential for physical surveillance.
translated by 谷歌翻译
Bayesian optimization~(BO) is often used for accelerator tuning due to its high sample efficiency. However, the computational scalability of training over large data-set can be problematic and the adoption of historical data in a computationally efficient way is not trivial. Here, we exploit a neural network model trained over historical data as a prior mean of BO for FRIB Front-End tuning.
translated by 谷歌翻译
图神经网络(GNN)已成为与图形和类似拓扑数据结构有关的无数任务的骨干。尽管已经在与节点和图形分类/回归任务有关的域中建立了许多作品,但它们主要处理单个任务。在图形上的持续学习在很大程度上没有探索,现有的图形持续学习方法仅限于任务的学习方案。本文提出了一个持续学习策略,该策略结合了基于架构和基于内存的方法。结构学习策略是由强化学习驱动的,在该学习中,对控制器网络进行了这种方式,以确定观察到新任务时从基本网络中添加/修剪的最佳节点,从而确保足够的网络能力。参数学习策略的基础是黑暗体验重播方法的概念,以应对灾难性的遗忘问题。我们的方法在任务收入学习和课堂学习设置中都通过几个图的连续学习基准问题进行了数值验证。与最近发表的作品相比,我们的方法在这两种设置中都表明了性能的提高。可以在\ url {https://github.com/codexhammer/gcl}上找到实现代码。
translated by 谷歌翻译
我们提出XMEM,这是一种由Atkinson-Shiffrin Memory模型启发的统一功能存储器存储的长视频的视频对象分割体系结构。视频对象分割的先前工作通常仅使用一种类型的功能内存。对于超过一分钟的视频,单个功能内存模型紧密地链接了内存消耗和准确性。相比之下,遵循Atkinson-Shiffrin模型,我们开发了一种结构,该体系结构结合了多个独立但深厚的特征记忆存储:快速更新的感觉存储器,高分辨率的工作记忆和紧凑的长期记忆。至关重要的是,我们开发了一种记忆增强算法,该算法通常将主动使用的工作记忆元素合并为长期记忆,从而避免记忆爆炸并最大程度地减少长期预测的性能衰减。结合新的记忆阅读机制,XMEM在与最先进的方法(不适用于长视频上使用)相当的长视频时,XMEM大大超过了长效数据集上的最先进性能数据集。代码可从https://hkchengrex.github.io/xmem获得
translated by 谷歌翻译
最近的文本到语音(TTS)的质量与人类的质量相当。但是,其在口语对话中的应用尚未得到广泛研究。这项研究旨在实现与人类对话非常相似的TT。首先,我们记录并抄录实际自发对话。然后,提出的对话TTS分为两个阶段:第一阶段,各种自动编码器(VAE) - VITS或高斯混合物变化自动编码器(GMVAE) - 培训了训练,从端到端文本对语音(VIT),最近提出的端到端TTS模型。从语音中提取潜在的口语表示的样式编码器与TTS共同培训。在第二阶段,对风格预测指标进行了训练,以预测从对话历史中综合的说话风格。在推断期间,通过将样式预测器预测的语言样式表示为VAE/gmvae-vits,可以以适合对话背景的样式合成语音。主观评估结果表明,所提出的方法在对话级别的自然性方面优于原始VIT。
translated by 谷歌翻译
基于有效干预措施的早期疾病检测和预防方法正在引起人们的注意。机器学习技术通过捕获多元数据中的个体差异来实现精确的疾病预测。精确医学的进展表明,在个人层面的健康数据中存在实质性异质性,并且复杂的健康因素与慢性疾病的发展有关。但是,由于多种生物标志物之间的复杂关系,确定跨疾病发作过程中的个体生理状态变化仍然是一个挑战。在这里,我们介绍了健康疾病阶段图(HDPD),它通过可视化在疾病进展过程早期波动的多种生物标志物的边界值来代表个人健康状态。在HDPD中,未来的发作预测是通过扰动多个生物标志物值的情况来表示的,同时考虑变量之间的依赖性。我们从3,238个个体的纵向健康检查队列中构建了11种非传染性疾病(NCD)的HDPD,其中包括3,215个测量项目和遗传数据。 HDPD中非发病区域的生物标志物值的改善显着阻止了11个NCD中的7个未来的疾病发作。我们的结果表明,HDPD可以在发作过程中代表单个生理状态,并用作预防疾病的干预目标。
translated by 谷歌翻译
在本文中,我们专注于使用神经网络的时间序列数据的生成。通常情况下,输入时间序列数据仅实现了一个(通常是不规则采样)路径,这使得很难提取时间序列特征,并且其噪声结构比I.I.D更为复杂。类型。时间序列数据,尤其是来自水文,电信,经济学和金融的数据,也表现出长期记忆,也称为长期依赖性(LRD)。本文的主要目的是在神经网络的帮助下人为地生成时间序列,并考虑到路径的LRD。我们提出了FSDE-NET:神经分数随机微分方程网络。它通过使用大于一半的HURST索引的分数Brownian运动来概括神经随机微分方程模型,该方程式大于一半。我们得出FSDE-NET的求解器,并理论上分析了FSDE-NET溶液的存在和唯一性。我们对人工和实时序列数据进行的实验表明,FSDE-NET模型可以很好地复制分布属性。
translated by 谷歌翻译